Generative models have been widely studied in computer vision. Recently, diffusion models have drawn substantial attention due to the high quality of their generated images. A key desired property of image generative models is the ability to disentangle different attributes, which should enable modification towards a style without changing the semantic content, and the modification parameters should generalize to different images. Previous studies have found that generative adversarial networks (GANs) are inherently endowed with such disentanglement capability, so they can perform disentangled image editing without re-training or fine-tuning the network. In this work, we explore whether diffusion models are also inherently equipped with such a capability. Our finding is that for stable diffusion models, by partially changing the input text embedding from a neutral description (e.g., "a photo of person") to one with style (e.g., "a photo of person with smile") while fixing all the Gaussian random noises introduced during the denoising process, the generated images can be modified towards the target style without changing the semantic content. Based on this finding, we further propose a simple, lightweight image editing algorithm in which the mixing weights of the two text embeddings are optimized for style matching and content preservation. The entire process only optimizes around 50 parameters and does not fine-tune the diffusion model itself. Experiments show that the proposed method can modify a wide range of attributes and outperforms diffusion-model-based image-editing algorithms that require fine-tuning. The optimized weights generalize well to different images. Our code is publicly available at https://github.com/UCSB-NLP-Chang/DiffusionDisentanglement.
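As a rough, hypothetical sketch of the described recipe (using the `diffusers` library and a single global mixing weight in place of the roughly 50 per-step weights the paper optimizes; the authors' actual implementation is in the linked repository):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def encode(prompt):
    tokens = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length, return_tensors="pt",
    ).input_ids.to("cuda")
    return pipe.text_encoder(tokens)[0]

neutral = encode("a photo of person")
styled = encode("a photo of person with smile")

lam = 0.6  # the paper instead optimizes ~50 per-step mixing weights
mixed = (1.0 - lam) * neutral + lam * styled

# Fixing the generator seed keeps every Gaussian noise in the denoising
# process identical, so output differences come only from the text embedding.
gen = torch.Generator("cuda").manual_seed(0)
image = pipe(prompt_embeds=mixed, generator=gen).images[0]
image.save("edited.png")
```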
Generative adversarial networks (GANs) have achieved great success in image inpainting yet still have difficulty tackling large missing regions. In contrast, iterative algorithms, such as autoregressive and denoising diffusion models, require massive computing resources to achieve decent results. To overcome their respective limitations, we present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image, greatly enhancing inference efficiency. Moreover, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion. On multiple benchmarks, we achieve new state-of-the-art performance. Code is released at https://github.com/fenglinglwb/SDM.
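A toy sketch of the general idea of iteratively "delivering" pixels into the hole (hypothetical model interface; it does not reflect SDM's actual decoupled probabilistic modeling):

```python
import torch

def iterative_inpaint(model, image, mask, steps=4):
    """image: (B, 3, H, W) with holes zeroed; mask: (B, 1, H, W), 1 = known pixel.
    `model(image, mask)` is a hypothetical network returning (prediction, confidence)."""
    for step in range(steps):
        pred, conf = model(image, mask)
        # lower the confidence bar as the final iteration approaches
        thresh = 1.0 - (step + 1) / steps
        newly_known = ((conf > thresh) & (mask < 0.5)).float()
        image = image * mask + pred * (1 - mask)          # always keep the latest guess
        mask = torch.clamp(mask + newly_known, 0.0, 1.0)  # grow the known region
    return image
```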
Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain gains from increasing parameters and training data in the way ViTs do. Unlike recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as its core operator, so that the model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also performs adaptive spatial aggregation conditioned on input and task information. As a result, InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data, as ViTs do. The effectiveness of our model is demonstrated on challenging benchmarks including ImageNet, COCO, and ADE20K. Notably, InternImage-H achieves a new record of 65.4 mAP on COCO test-dev. The code will be released at https://github.com/OpenGVLab/InternImage.
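A minimal sketch of a deformable-convolution block in this spirit, built on torchvision's `DeformConv2d` (the official InternImage operator, DCNv3, differs and lives in the linked repository):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        n = kernel_size * kernel_size
        # offsets (2 per sampling point) and modulation masks are predicted
        # from the input, giving input-conditioned spatial aggregation
        self.offset = nn.Conv2d(channels, 2 * n, kernel_size, padding=pad)
        self.mask = nn.Conv2d(channels, n, kernel_size, padding=pad)
        self.dcn = DeformConv2d(channels, channels, kernel_size, padding=pad)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x):
        offset = self.offset(x)
        mask = torch.sigmoid(self.mask(x))
        return x + self.act(self.norm(self.dcn(x, offset, mask)))

x = torch.randn(1, 64, 56, 56)
print(DeformBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```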
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
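A small usage sketch, assuming the Hugging Face `transformers` library and the public BLOOM checkpoints (the 560M variant is used here for convenience; the full model has 176B parameters):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Few-shot / instruction-style prompting, as described in the abstract.
prompt = "Translate to French: 'The cat sleeps on the mat.' ->"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```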
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs that have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with this NPU, demonstrating up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
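For illustration only, a generic tiny pixel-shuffle 3X upscaler with PyTorch post-training INT8 quantization, not any particular challenge entry (submissions in the challenge targeted the Synaptics NPU through its own toolchain):

```python
import torch
import torch.nn as nn

class TinySR(nn.Module):
    """An ESPCN-like 3X upscaler small enough for INT8 NPU deployment."""
    def __init__(self, scale=3, ch=16):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.dequant = torch.ao.quantization.DeQuantStub()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3 * scale * scale, 3, padding=1),
        )
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        x = self.dequant(self.body(self.quant(x)))  # pixel shuffle stays in float
        return self.shuffle(x)

model = TinySR().eval()
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
prepared = torch.ao.quantization.prepare(model)
prepared(torch.rand(1, 3, 64, 64))                   # calibration pass
int8_model = torch.ao.quantization.convert(prepared)
print(int8_model(torch.rand(1, 3, 640, 360)).shape)  # (1, 3, 1920, 1080) = Full HD
```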
Ontologies are a popular approach to knowledge representation in many fields, including the legal domain, and description logics (DL) are commonly used as their description language. To reason with inconsistent DL-based legal ontologies, this paper introduces a structured argumentation framework, in particular ASPIC+ for reasoning in the legal context, and translates legal ontologies into formulas and rules of an argumentation theory. From the perspective of legal AI, with particular attention to the design of autonomous vehicles, we show that with this combined theory of formal argumentation and DL-based legal ontologies, acceptable assertions can be obtained from an inconsistent ontology, and traditional reasoning tasks over DL ontologies can also be accomplished despite the inconsistency. In addition, a formal definition of explanations for the reasoning results is given.
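As a deliberately simplified toy, far short of ASPIC+ itself, the sketch below encodes inconsistent knowledge as strict and defeasible rules and rejects a defeasible conclusion when a strictly supported contrary is derivable; all rule and literal names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    premises: frozenset
    conclusion: str
    strict: bool  # strict rules cannot be defeated on their conclusion

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def derive(facts, rules):
    """Forward-chain rules, remembering whether each literal has strict support."""
    support = {f: True for f in facts}          # literal -> derived strictly?
    changed = True
    while changed:
        changed = False
        for r in rules:
            if r.premises.issubset(support) and r.conclusion not in support:
                support[r.conclusion] = r.strict and all(support[p] for p in r.premises)
                changed = True
    return support

facts = {"autonomous_vehicle", "pedestrian_detected"}
rules = [
    Rule(frozenset({"pedestrian_detected"}), "must_brake", strict=True),
    Rule(frozenset({"autonomous_vehicle"}), "-must_brake", strict=False),  # conflicting norm
]
support = derive(facts, rules)
for lit in support:
    defeated = neg(lit) in support and support[neg(lit)]
    print(lit, "defeated" if defeated else "acceptable")
```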
Knowledge distillation (KD) for convolutional neural networks (CNNs) has been extensively studied to improve the performance of small models. Recently, Vision Transformers (ViTs) have achieved great success on many computer vision tasks, and KD for ViTs is also in demand. However, besides output-logit-based KD, feature-based KD methods designed for CNNs cannot be directly applied to ViTs because of the huge structural gap. In this paper, we explore feature-based distillation for ViTs. Based on the nature of feature maps in ViTs, we design a series of controlled experiments and derive three practical guidelines for ViT feature distillation. Some of our findings are even contrary to the practices of the CNN era. Following the three guidelines, we propose the feature-based method ViTKD, which brings consistent and considerable improvements to students. On ImageNet-1K, we boost DeiT-Tiny from 74.42% to 76.06%, DeiT-Small from 80.55% to 81.95%, and DeiT-Base from 81.76% to 83.46%. Moreover, ViTKD and logit-based KD methods are complementary and can be applied together directly, further improving the student's performance. Specifically, the students DeiT-Tiny, DeiT-Small, and DeiT-Base reach 77.78%, 83.59%, and 85.41%, respectively. The code is available at https://github.com/yzd-v/cls_kd.
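A generic feature-distillation sketch for ViTs, hypothetical and much simpler than the actual ViTKD losses: project student patch tokens to the teacher's width and match them with an MSE loss, alongside the usual cross-entropy loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureKD(nn.Module):
    def __init__(self, student_dim=192, teacher_dim=768, weight=1.0):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)
        self.weight = weight

    def forward(self, student_tokens, teacher_tokens):
        """tokens: (B, N, D) features from an intermediate transformer block."""
        return self.weight * F.mse_loss(self.proj(student_tokens),
                                        teacher_tokens.detach())

kd = FeatureKD()
s = torch.randn(8, 196, 192)   # e.g. DeiT-Tiny tokens
t = torch.randn(8, 196, 768)   # e.g. a larger teacher's tokens
loss = kd(s, t)                # added to the task's cross-entropy loss
print(loss.item())
```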
Knowledge distillation (KD) has been extensively developed and has enhanced various tasks. The classic KD method adds a KD loss to the original cross-entropy (CE) loss. We attempt to decompose the KD loss to explore its relationship with the CE loss. Surprisingly, we find that it can be regarded as a combination of the CE loss and an extra loss that has the same form as the CE loss. However, we notice that the extra loss forces the student to learn the relative probabilities of the teacher's absolute probabilities. Moreover, the sums of these two sets of probabilities differ, which makes the loss hard to optimize. To address this issue, we revise the formulation and propose a distributed loss. In addition, we use the teacher's target output as a soft target and propose a soft loss. Combining the soft loss and the distributed loss, we propose a new KD loss (NKD). Furthermore, we smooth the student's target output and treat it as a soft target for training without a teacher, yielding a teacher-free new KD loss (TF-NKD). Our methods achieve state-of-the-art performance on CIFAR-100 and ImageNet. For example, with ResNet-34 as the teacher, we improve the ImageNet top-1 accuracy of ResNet-18 from 69.90% to 71.96%. For training without a teacher, MobileNet, ResNet-18, and Swin-Transformer-Tiny reach 70.04%, 70.76%, and 81.48%, which are 0.83%, 0.86%, and 0.30% higher than their baselines, respectively. The code is available at https://github.com/yzd-v/cls_kd.
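A hedged sketch of the two described ingredients, a target-class "soft" term and a renormalized non-target "distributed" term; the exact NKD formulation is in the linked repository, and the weighting and temperature here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def nkd_like_loss(student_logits, teacher_logits, labels, temp=1.0, alpha=1.0):
    s_log = F.log_softmax(student_logits, dim=1)
    t = F.softmax(teacher_logits, dim=1)
    idx = labels.unsqueeze(1)

    # soft loss: the teacher's target probability acts as a soft target weight
    soft = -(t.gather(1, idx) * s_log.gather(1, idx)).mean()

    # distributed loss: match the renormalized non-target distributions
    mask = torch.ones_like(t).scatter_(1, idx, 0.0)
    t_non = F.softmax(teacher_logits / temp, dim=1) * mask
    t_non = t_non / t_non.sum(dim=1, keepdim=True)
    s_non = F.log_softmax(student_logits.masked_fill(mask == 0, float("-inf")) / temp, dim=1)
    s_non = s_non.masked_fill(mask == 0, 0.0)   # avoid 0 * (-inf) at the target slot
    dist = -(t_non * s_non).sum(dim=1).mean()

    return soft + alpha * dist

logits_s, logits_t = torch.randn(4, 100), torch.randn(4, 100)
labels = torch.randint(0, 100, (4,))
print(nkd_like_loss(logits_s, logits_t, labels).item())
```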
In recent years, convolutional neural networks (CNNs) have shown great potential for synthetic aperture radar (SAR) target recognition. SAR images have a strong sense of granularity and contain varied texture features, such as speckle noise, dominant target scatterers, and target contours, which are rarely considered in conventional CNN models. This paper proposes two residual blocks, namely EMC2A blocks with multi-scale receptive fields (RFs), based on a multibranch structure, and then designs an efficient isotope-architecture deep CNN (DCNN), EMC2A-Net. The EMC2A blocks utilize parallel dilated convolutions with different dilation rates, which can effectively capture multi-scale context features without significantly increasing the computational burden. To further improve the efficiency of multi-scale feature fusion, this paper proposes a multi-scale feature cross-channel attention module, namely the EMC2A module, which adopts a local multi-scale feature interaction strategy without dimensionality reduction. This strategy adaptively adjusts the weight of each channel through an efficient one-dimensional (1D) circular convolution and a sigmoid function to guide global channel-wise attention. Comparison results on the MSTAR dataset show that EMC2A-Net outperforms existing models of the same type while having a relatively lightweight network structure. Ablation results demonstrate that, with only a few parameters and appropriate cross-channel interaction, the EMC2A module significantly improves model performance.
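A hedged sketch of the two described ingredients (parallel dilated convolutions for multi-scale context, and cross-channel attention via a 1D circular convolution plus a sigmoid, with no dimension reduction), not the official EMC2A-Net code; the channel counts and dilation rates are assumptions.

```python
import torch
import torch.nn as nn

class EMC2ALikeBlock(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 3), k=3):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        # ECA-style cross-channel attention: 1D circular conv over the channel axis
        self.attn = nn.Conv1d(1, 1, k, padding=k // 2,
                              padding_mode="circular", bias=False)

    def forward(self, x):
        feat = sum(branch(x) for branch in self.branches)
        w = feat.mean(dim=(2, 3))                                 # (B, C) global pooling
        w = torch.sigmoid(self.attn(w.unsqueeze(1))).squeeze(1)   # (B, C) channel weights
        return x + feat * w[:, :, None, None]

x = torch.randn(2, 32, 64, 64)
print(EMC2ALikeBlock(32)(x).shape)   # torch.Size([2, 32, 64, 64])
```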
Melanoma classification from dermoscopic images with deep learning has recently shown great potential for automated early melanoma diagnosis. However, limited by significant data imbalance and obvious external artifacts, i.e., hair and ruler marks, discriminative feature extraction from dermoscopic images is very challenging. In this study, we seek to resolve these problems separately so as to better represent lesion features. Specifically, a GAN-based data augmentation (GDA) strategy is used to generate synthetic melanoma-positive images, together with the proposed implicit hair denoising (IHD) strategy, in which hair-related representations are implicitly disentangled by an auxiliary classifier network and reversely sent to the melanoma-feature-extraction backbone for better melanoma-specific representation learning. Furthermore, to train the IHD module, hair noises are additionally labeled on the ISIC2020 dataset, making it the first large-scale dermoscopic dataset with annotations of hair-like artifacts. Extensive experiments demonstrate the superiority of the proposed framework as well as the effectiveness of each component. The improved dataset is publicly available at https://github.com/kirtsy/dermoscopicdataset.
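One hedged way to realize the described implicit disentanglement is an auxiliary hair classifier behind a gradient-reversal layer; this is a sketch of the idea only, not the authors' IHD module, and all names and the feature dimension are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        # flip the gradient so the backbone learns to discard hair-related cues
        return -ctx.lam * grad, None

class HairAdversarialHead(nn.Module):
    def __init__(self, feat_dim=512, lam=0.1):
        super().__init__()
        self.lam = lam
        self.classifier = nn.Linear(feat_dim, 1)   # "contains hair artifact?"

    def forward(self, features):
        reversed_feat = GradReverse.apply(features, self.lam)
        return self.classifier(reversed_feat)

# Usage: total loss = melanoma loss + BCE on this head against the extra
# hair-noise labels described above; the reversed gradient makes the
# backbone's melanoma features less dependent on hair.
backbone_feat = torch.randn(8, 512, requires_grad=True)
head = HairAdversarialHead()
hair_labels = torch.randint(0, 2, (8, 1)).float()
loss = nn.functional.binary_cross_entropy_with_logits(head(backbone_feat), hair_labels)
loss.backward()
```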